    Physics inspired methods for crowd video surveillance and analysis: a survey

    Characterizing Mechanisms for Factual Recall in Language Models

    Language Models (LMs) often must integrate facts they memorized in pretraining with new information that appears in a given context. These two sources can disagree, causing competition within the model, and it is unclear how an LM will resolve the conflict. On a dataset that queries for knowledge of world capitals, we investigate both distributional and mechanistic determinants of LM behavior in such situations. Specifically, we measure the proportion of the time an LM will use a counterfactual prefix (e.g., "The capital of Poland is London") to overwrite what it learned in pretraining ("Warsaw"). On Pythia and GPT2, the training frequency of both the query country ("Poland") and the in-context city ("London") strongly affects the models' likelihood of using the counterfactual. We then use head attribution to identify individual attention heads that promote either the memorized answer or the in-context answer in the logits. By scaling up or down the value vector of these heads, we can control the likelihood of using the in-context answer on new data. This method can increase the rate of generating the in-context answer to 88% of the time simply by scaling a single head at runtime. Our work contributes to a body of evidence showing that we can often localize model behaviors to specific components and provides a proof of concept for how future methods might control model behavior dynamically at runtime.
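The head-scaling idea in this abstract rests on the fact that an attention head's output is linear in its value vectors, so multiplying the values by a factor multiplies that head's contribution by the same factor. The toy below is only a sketch of that property for a single head; the function names, the `scale` parameter, and the tiny vectors are hypothetical, and it is not the authors' implementation or any library's API.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def head_output(query, keys, values, scale=1.0):
    """Single-head scaled dot-product attention for one query position.

    `scale` multiplies every value vector, which is the (assumed) form of
    the runtime intervention: the attention pattern is left untouched.
    """
    dim = len(query)
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(dim) for key in keys
    ]
    weights = softmax(scores)
    out_dim = len(values[0])
    return [
        sum(w * scale * value[i] for w, value in zip(weights, values))
        for i in range(out_dim)
    ]


# Hypothetical toy inputs: two context positions, 2-dimensional head.
query = [0.5, -1.0]
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[0.2, 0.4], [-0.6, 0.8]]

baseline = head_output(query, keys, values)
boosted = head_output(query, keys, values, scale=3.0)
```

Because only the values are scaled, `boosted` is three times `baseline` element by element. How that change propagates through later layers to the final logits is model-specific and is not captured by this sketch.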

    Are Language Models Worse than Humans at Following Prompts? It's Complicated

    Prompts have been the center of progress in advancing language models' zero-shot and few-shot performance. However, recent work finds that models can perform surprisingly well when given intentionally irrelevant or misleading prompts. Such results may be interpreted as evidence that model behavior is not "human like". In this study, we challenge a central assumption in such work: that humans would perform badly when given pathological instructions. We find that humans are able to reliably ignore irrelevant instructions and thus, like models, perform well on the underlying task despite an apparent lack of signal regarding the task they are being asked to do. However, when given deliberately misleading instructions, humans follow the instructions faithfully, whereas models do not. Our findings caution that future research should not idealize human behaviors as a monolith and should not train or evaluate models to mimic assumptions about these behaviors without first validating humans' behaviors empirically. Comment: EMNLP 202

    Does CLIP Bind Concepts? Probing Compositionality in Large Image Models

    Large-scale neural network models combining text and images have made incredible progress in recent years. However, it remains an open question to what extent such models encode compositional representations of the concepts over which they operate, such as correctly identifying "red cube" by reasoning over the constituents "red" and "cube". In this work, we focus on the ability of a large pretrained vision and language model (CLIP) to encode compositional concepts and to bind variables in a structure-sensitive way (e.g., differentiating "cube behind sphere" from "sphere behind cube"). To inspect the performance of CLIP, we compare several architectures from research on compositional distributional semantics models (CDSMs), a line of research that attempts to implement traditional compositional linguistic structures within embedding spaces. We find that CLIP can compose concepts in a single-object setting, but in situations where concept binding is needed, performance drops dramatically. At the same time, CDSMs also perform poorly, with best performance at chance level.

    Adverse drug events in Chinese elder inpatients: a retrospective review for evaluating the efficiency of the Global Trigger Tool

    Background: Elderly patients frequently experience a high incidence of adverse drug events (ADEs) due to the coexistence of multiple diseases, the combination of various medications, poor medication compliance, and other factors. The Global Trigger Tool (GTT) is a new method for identifying ADEs. It introduces the concept of a trigger, that is, a clue such as an abnormal laboratory value, a reversal drug, or a clinical symptom that may suggest an ADE, and it uses triggers to locate ADE-related information in the medical record. The aim of this study was to establish a GTT-based trigger tool for adverse medication events in elderly patients and to investigate the risk variables associated with such events. Methods: The triggers were identified by reviewing the frequency of ADEs in elderly patients in Sichuan, China, retrieving relevant literature, and consulting experts. A retrospective analysis was carried out to identify adverse medication occurrences among 480 elderly inpatients in Sichuan People’s Hospital. Results: A total of 56 ADEs were detected in 51 patients (10.62%), 13.04 per 1,000 patient days, and 11.67 per 100 admissions. The overall positive predictive value (PPV) of the triggers was 23.84, and 94.64% of ADEs caused temporary injury. Gastrointestinal system injury (27.87%) and metabolic and nutritional disorders (24.53%) were the primary organ-systems affected by ADEs. The majority of ADEs were caused by drugs used to treat cardiovascular diseases. 71.43% of ADEs occurred within 2 days of administration, and the risk factor analysis revealed that the number of medicines was significantly correlated with ADE occurrence. Conclusion: This study demonstrated GTT’s value as a tool for ADE detection in elderly inpatients in China. It enhances the level of medication management and comprehensively reflects the situation of ADEs in the elderly.
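The incidence figures in the Results can be cross-checked with simple arithmetic. The sketch below uses only numbers stated in the abstract and assumes that each of the 480 reviewed inpatients corresponds to one admission; the total number of patient days is not reported, so it is back-calculated from the stated rate and should be read as an implied value, not a reported one.

```python
# Figures reported in the abstract.
n_inpatients = 480            # elderly inpatients reviewed
n_ades = 56                   # adverse drug events detected
n_patients_with_ade = 51      # patients with at least one ADE
ades_per_1000_patient_days = 13.04

# Share of reviewed patients who experienced at least one ADE (percent).
pct_patients_with_ade = 100 * n_patients_with_ade / n_inpatients

# ADEs per 100 admissions, ASSUMING one admission per reviewed inpatient.
ades_per_100_admissions = 100 * n_ades / n_inpatients

# Patient days are not given in the abstract; this is the total implied
# by the reported rate of 13.04 ADEs per 1,000 patient days.
implied_patient_days = n_ades / ades_per_1000_patient_days * 1000

print(f"{pct_patients_with_ade:.3f}% of patients had an ADE")
print(f"{ades_per_100_admissions:.2f} ADEs per 100 admissions")
print(f"about {implied_patient_days:.0f} patient days implied")
```

Under that one-admission-per-patient assumption, the computed values are consistent with the reported 10.62% and 11.67 per 100 admissions.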

    Large expert-curated database for benchmarking document similarity detection in biomedical literature search

    Document recommendation systems for locating relevant literature have mostly relied on methods developed a decade ago. This is largely due to the lack of a large offline gold-standard benchmark of relevant documents that cover a variety of research fields such that newly developed literature search techniques can be compared, improved and translated into practice. To overcome this bottleneck, we have established the RElevant LIterature SearcH consortium consisting of more than 1500 scientists from 84 countries, who have collectively annotated the relevance of over 180 000 PubMed-listed articles with regard to their respective seed (input) article/s. The majority of annotations were contributed by highly experienced, original authors of the seed articles. The collected data cover 76% of all unique PubMed Medical Subject Headings descriptors. No systematic biases were observed across different experience levels, research fields or time spent on annotations. More importantly, annotations of the same document pairs contributed by different scientists were highly concordant. We further show that the three representative baseline methods used to generate recommended articles for evaluation (Okapi Best Matching 25, Term Frequency-Inverse Document Frequency and PubMed Related Articles) had similar overall performances. Additionally, we found that these methods each tend to produce distinct collections of recommended articles, suggesting that a hybrid method may be required to completely capture all relevant articles. The established database server located at https://relishdb.ict.griffith.edu.au is freely available for the downloading of annotation data and the blind testing of new methods. We expect that this benchmark will be useful for stimulating the development of new powerful techniques for title and title/abstract-based search engines for relevant articles in biomedical research. Peer reviewed

    Fuzzy Evaluation of Crowd Safety Based on Pedestrians’ Number and Distribution Entropy

    Crowd video monitoring and analysis is a hot topic in computer vision and public management. Evaluating crowd safety in advance helps predict crowd status and avoid catastrophic events. This paper proposes a method to evaluate crowd safety based on fuzzy inference. The number of pedestrians and the uniformity of their distribution are considered in a fuzzy inference system as two attributes of a crowd. First, the number of pedestrians is estimated from the number of foreground pixels. Then, the distribution uniformity of the crowd is calculated as a distribution entropy after dividing the monitored scene into several small areas. Next, the fuzzy system is constructed with two input variables (number of pedestrians and distribution entropy) and one output variable (crowd safety status). Finally, inference rules relating the crowd safety state to the number of pedestrians and the distribution uniformity are constructed to obtain the pre-evaluation of the safety state of the crowd. Three video sequences extracted from different scenes are used in the experiment. Experimental results show that the proposed method can be used to evaluate the safety status of the crowd in a monitoring scene.
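The distribution-entropy step can be illustrated with a short sketch. The abstract does not give the exact entropy formula, the grid size, or the fuzzy membership functions, so the code below assumes plain Shannon entropy over per-cell foreground counts, normalized by the maximum possible entropy; the function names and the example counts are hypothetical.

```python
import math


def distribution_entropy(cell_counts):
    """Shannon entropy (in nats) of how a crowd is spread over grid cells.

    `cell_counts` holds one non-negative count per cell, for example the
    number of foreground pixels that fall in that cell.
    """
    total = sum(cell_counts)
    if total == 0:
        return 0.0  # empty scene: no distribution to measure
    entropy = 0.0
    for count in cell_counts:
        if count > 0:
            share = count / total
            entropy -= share * math.log(share)
    return entropy


def normalized_entropy(cell_counts):
    """Entropy rescaled to [0, 1]; 1 means a perfectly uniform spread."""
    n_cells = len(cell_counts)
    if n_cells < 2:
        return 0.0
    return distribution_entropy(cell_counts) / math.log(n_cells)


# Hypothetical 2x2 grid, flattened row by row.
evenly_spread = normalized_entropy([5, 5, 5, 5])
one_cluster = normalized_entropy([20, 0, 0, 0])
mostly_clustered = normalized_entropy([14, 2, 2, 2])
```

A uniformly spread crowd scores close to 1 and a crowd packed into a single cell scores 0, which is the kind of signal a fuzzy rule base could combine with the pedestrian count.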

    Detection of Shoot Beetle Stress on Yunnan Pine Forest Using a Coupled LIBERTY2-INFORM Simulation

    Yunnan pine shoot beetles (PSB), Tomicus yunnanensis and Tomicus minor, have spread through southwestern China in the last five years, leading to millions of hectares of forest being damaged. Thus, there is an urgent need to develop an effective approach for accurate early warning and damage assessment of PSB outbreaks. Remote sensing is one of the most efficient methods for this purpose. Despite many studies existing on the mountain pine beetle (MPB), very little work has been undertaken on assessing PSB stress using remote sensing. The objective of this paper was to develop a spectral linear mixing model aided by radiative transfer (RT) and a new Yellow Index (YI) to simulate the reflectance of heterogeneous canopies containing damaged needles and quantitatively invert their PSB stress. The YI, the fraction of dead needles, is a physically-explicit stress indicator that represents the plot shoot damage ratio (plot SDR). The major steps of this method are: (1) LIBERTY2 was developed to simulate the reflectance of damaged needles using YI to linearly mix the green needle spectra with the dead needle spectra; (2) LIBERTY2 was coupled with the INFORM model to scale the needle spectra to the canopy scale; and (3) a look-up table (LUT) was created against Sentinel 2 (S2) imagery and used to invert leaf chlorophyll content (LCC), green leaf area index (LAI) and plot SDR.
The results show that (1) LIBERTY2 effectively simulated the spectral reflectance of infested needles (mean relative error (MRE) = 1.4–18%), and the YI can indicate the degree of needle damage; (2) the coupled LIBERTY2-INFORM model is suitable to estimate LAI (R² = 0.73, RMSE = 0.17 m² m⁻², NRMSE = 11.41% and the index of agreement (IOA) = 0.92) and LCC (R² = 0.49, RMSE = 56.24 mg m⁻², NRMSE = 25.22% and IOA = 0.72), and is better than the original LIBERTY model (LAI: R² = 0.38, RMSE = 0.43 m² m⁻², NRMSE = 28.85% and IOA = 0.68; LCC: R² = 0.34, RMSE = 76.44 mg m⁻², NRMSE = 34.23% and IOA = 0.57); and (3) the inverted YI is positively correlated with the measured plot SDR (R² = 0.40, RMSE = 0.15). We conclude that the LIBERTY2 model improved the reflectance simulation accuracy of both the needles and canopies, making it suitable for assessing PSB stress. The YI has the potential to assess PSB damage.
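Step (1) and two of the reported accuracy metrics can be sketched in a few lines. This is an illustration under stated assumptions, not the LIBERTY2 code: the mixing is taken to be a per-band weighted average with YI as the dead-needle fraction, the index of agreement is assumed to be Willmott's formulation (the abstract does not say which one), and all function names and spectra values are hypothetical.

```python
import math


def mix_needle_reflectance(green, dead, yellow_index):
    """Linearly mix green and dead needle spectra, band by band.

    `yellow_index` is the fraction of dead needles, from 0 (all green)
    to 1 (all dead).
    """
    if not 0.0 <= yellow_index <= 1.0:
        raise ValueError("yellow_index must lie in [0, 1]")
    return [
        (1.0 - yellow_index) * g + yellow_index * d
        for g, d in zip(green, dead)
    ]


def rmse(observed, predicted):
    """Root mean square error between paired observations and predictions."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def index_of_agreement(observed, predicted):
    """Willmott's index of agreement (assumed definition of IOA)."""
    mean_obs = sum(observed) / len(observed)
    numerator = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    denominator = sum(
        (abs(p - mean_obs) + abs(o - mean_obs)) ** 2
        for o, p in zip(observed, predicted)
    )
    return 1.0 if denominator == 0 else 1.0 - numerator / denominator


# Hypothetical two-band spectra (for example red and near-infrared).
green_spectrum = [0.1, 0.5]
dead_spectrum = [0.3, 0.2]
half_damaged = mix_needle_reflectance(green_spectrum, dead_spectrum, 0.5)
```

With a Yellow Index of 0 the mix reproduces the green spectrum and with 1 it reproduces the dead spectrum. Scaling to the canopy with INFORM and the look-up-table inversion are outside the scope of this sketch.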

    A Low-Power ADPLL with Calibration-Free RO-Based Injection-Locking TDC for BLE Applications

    This paper proposes a low-power all-digital phase-locked loop (ADPLL) with a calibration-free ring oscillator (RO)-based injection-locking time-to-digital converter (TDC) for BLE applications. The RO is reused as the delay cell of the TDC, and the quantization step of the TDC always tracks the RO period; hence no calibration is needed in this architecture. We adopt RO tuning to lower the injection-locking bandwidth so as to decrease the power consumption of the injection current. Moreover, the fractional part of phase error detection is turned down in the coarse tuning of the ADPLL to save power. An LC-based digital-controlled oscillator (LCDCO) with a 6.4 nH inductor and a resistive bias is used to achieve low power and better phase noise performance. The ADPLL is fabricated in 40 nm CMOS with a 1 V supply and consumes 1.4 mW when it is locked. The measured phase noise is −114 dBc/Hz at 1 MHz offset. The test results show significant power saving. Thus, it can be a promising candidate for BLE applications.